On the Acceleration of Wavefront Applications using Distributed Many-Core Architectures
Authors
Abstract
In this paper we investigate the use of distributed GPU-based architectures to accelerate pipelined wavefront applications, a ubiquitous class of parallel algorithms used to solve a number of scientific and engineering problems. Specifically, we employ a recently developed port of the LU solver (from the NAS Parallel Benchmark suite) to investigate the performance of these algorithms on high-performance computing solutions from NVIDIA (Tesla C1060 and C2050) as well as on traditional clusters (AMD/InfiniBand and IBM BlueGene/P). Benchmark results are presented for problem classes A to C, and a recently developed performance model is used to provide projections for problem classes D and E, the latter of which represents a billion-cell problem. Our results demonstrate that while the theoretical performance of GPU solutions will far exceed that of many traditional technologies, the sustained application performance is currently comparable for scientific wavefront applications. Finally, a breakdown of the GPU solution is conducted, exposing PCIe overheads and decomposition constraints. A new k-blocking strategy is proposed to improve the future performance of this class of algorithm on GPU-based architectures.
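The dependency pattern behind a wavefront computation can be illustrated with a minimal Python sketch. It is not the LU solver or the k-blocking strategy from the paper; the grid size, the toy recurrence, and the function names `hyperplanes` and `sweep` are assumptions made purely for illustration. Each cell (i, j, k) depends on its three upstream neighbours, so all cells with the same i + j + k lie on one wavefront and could be updated in parallel.

```python
from itertools import product


def hyperplanes(nx, ny, nz):
    """Group cell indices by wavefront step s = i + j + k (hypothetical helper)."""
    planes = [[] for _ in range(nx + ny + nz - 2)]
    for i, j, k in product(range(nx), range(ny), range(nz)):
        planes[i + j + k].append((i, j, k))
    return planes


def sweep(order):
    """Toy recurrence: each cell is 1 plus the sum of its upstream neighbours.

    Raises KeyError if `order` visits a cell before one of its dependencies.
    """
    u = {}

    def upstream(i, j, k):
        # Indices below zero are outside the grid and contribute nothing.
        return 0 if min(i, j, k) < 0 else u[(i, j, k)]

    for i, j, k in order:
        u[(i, j, k)] = (1 + upstream(i - 1, j, k)
                          + upstream(i, j - 1, k)
                          + upstream(i, j, k - 1))
    return u


nx, ny, nz = 3, 4, 2
planes = hyperplanes(nx, ny, nz)
sequential = sweep(product(range(nx), range(ny), range(nz)))
wavefront = sweep(cell for plane in planes for cell in plane)
print(len(planes), sequential == wavefront)
```

Cells within a single plane never read from one another, which is why a parallel implementation can process one plane per step; the number of steps grows with nx + ny + nz rather than with the total cell count.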
Similar resources
Ultra-Low-Energy DSP Processor Design for Many-Core Parallel Applications
Background and Objectives: Digital signal processors are widely used in energy constrained applications in which battery lifetime is a critical concern. Accordingly, designing ultra-low-energy processors is a major concern. In this work and in the first step, we propose a sub-threshold DSP processor. Methods: As our baseline architecture, we use a modified version of an existing ultra-low-power...
Wavefront/Systolic Algorithms for Implementation of Stereo Vision and Obstacle Avoidance Computations on a Very Low Power MIMD Many-Core Parallel Architecture: Applications for Mobile Systems and Wearable Visual Guidance
© 2012 Diotalevi et al., licensee InTech. This is an open access chapter distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Wavefront/Systolic Algorithms for Implementation of Stereo Vision and Obstacle Avo...
Parallel performance results for the OpenMOC neutron transport code on multicore platforms
The shift towards multi-core architectures has ushered in a new era of shared memory parallelism for scientific applications. This transition has introduced challenges for the nuclear engineering community as it seeks to design high-fidelity full-core reactor physics simulation tools. This paper describes the parallel transport sweep algorithm in the OpenMOC method of characteristics (MOC) neut...
Distributed Two-Dimensional Fourier Transforms on DSPs: With Applications for Phase Retrieval
Title of Document: DISTRIBUTED TWO-DIMENSIONAL FOURIER TRANSFORMS ON DSPs: WITH APPLICATIONS FOR PHASE RETRIEVAL Jeffrey Scott Smith, Master of Science, 2006 Directed By: Professor Bruce Jacob Department of Electrical Engineering Many applications of two-dimensional Fourier Transforms require fixed timing as defined by system specifications. One example is image-based wavefront sensing. The ima...
Efficient Execution of Irregular Wavefront Propagation Pattern on Many Integrated Core Architecture
The efficient execution of image processing algorithms is an active area of Bioinformatics. In image processing, one of the classes of algorithms or computing pattern that works with irregular data structures is the Irregular Wavefront Propagation Pattern (IWPP). In this class, elements propagate information to neighbors in the form of wave propagation. This propagation results in irregular acc...
Journal title: Comput. J.
Volume: 55
Publication date: 2012